Development of a Real-time ASR System for Slovak SpeechDat Database

Authors

  • Milos Cernak
  • Marian Trnka
Abstract

This paper describes the development of a real-time speech recognition system in Slovak for voice-operated telephone services. The system is based on the SPHINX2 platform. The decoder, which uses Hidden Markov Models, was trained on the SpeechDat-E Slovak database. It is a speaker-independent, large-vocabulary, continuous-speech, real-time automatic speech recognition system. Test results are given for the test groups of isolated digits, connected isolated digits, application words, phonetically rich words, and city (plus proper) names. The achieved word error rates range from 3.73% (connected isolated digits, vocabulary of 11 words) to 15.72% (city names, vocabulary of 927 words).
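The abstract reports word error rates but does not say how they were scored. As a minimal sketch, assuming the standard definition WER = (substitutions + deletions + insertions) / number of reference words, the metric can be computed with a word-level edit distance. The function name `word_error_rate` and the example strings are hypothetical illustrations, not taken from the paper or from SPHINX2.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate under the standard (S + D + I) / N definition.

    Assumption: words are whitespace-separated and compared exactly;
    the paper's own scoring procedure is not described in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Made-up example: one of four reference words is dropped by the recognizer.
print(word_error_rate("jeden dva tri štyri", "jeden dva štyri"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.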

Similar resources

Recording of Czech and Slovak Telephone Databases within SpeechDat-E

Databases for five East European languages (Czech, Slovak, Russian, Polish and Hungarian) are being created within the SpeechDat-E project. This paper describes the overall design of the SpeechDat-E databases and concentrates on the Czech (1000 speakers) and Slovak (1000 speakers) databases. The item structure and recording specifications are presented. A more detailed description is included for the language-sp...

Basque Speecon-like and Basque SpeechDat MDB-600: speech databases for the development of ASR technology for Basque

This paper introduces two databases specifically designed for the development of ASR technology for the Basque language: the Basque Speecon-like database and the Basque SpeechDat MDB-600 database. The former was recorded in an office environment according to the Speecon specifications, whereas the latter was recorded through mobile telephones according to the SpeechDat specifications. Both datab...

Crosslingual and bilingual speech recognition with Slovak and Czech SpeechDat-E databases

This paper presents work on crosslingual and bilingual speech recognition carried out with the SpeechDat databases for the Czech and Slovak languages. The work follows the MASPER initiative, which was formed as part of the COST 278 Action. In the crosslingual experiments, expert-driven and data-driven approaches were used to transfer monolingual source acoustic models to a target language. Th...

SpeechDat(E) - Eastern European Telephone Speech Databases

This paper describes the creation of five new telephony speech databases for Central and Eastern European languages within the SpeechDat(E) project. The five languages concerned are Czech, Polish, Slovak, Hungarian, and Russian. The databases follow the SpeechDat-II specifications with some language-specific adaptation. The present paper describes the differences between SpeechDat(E) and earlier Speec...

SpeechDat-E: five Eastern European speech databases for voice-operated teleservices completed

In the SpeechDat-E project, five medium-large telephone speech databases have been collected for Czech, Hungarian, Polish, Russian, and Slovak. The project was recently concluded. This paper reports briefly on the contents of the databases and elaborates on experience gained from the data recordings and from the validation of the databases. The availability of the databases to the public is addres...

Publication date: 2005